Discovering General Multidimensional Associations
Authors
Abstract
When two variables are related by a known function, the coefficient of determination (denoted R²) measures the proportion of the total variance in the observations explained by that function. For linear relationships, this is equal to the square of the correlation coefficient, ρ. When the parametric form of the relationship is unknown, however, it is unclear how to estimate the proportion of explained variance equitably, that is, assigning similar values to equally noisy relationships. Here we demonstrate how to directly estimate a generalised R² when the form of the relationship is unknown, and we consider the performance of the Maximal Information Coefficient (MIC), a recently proposed information-theoretic measure of dependence. We show that our approach behaves equitably, has more power than MIC to detect association between variables, and converges faster with increasing sample size. Most importantly, our approach generalises to higher dimensions, estimating the strength of multivariate relationships (Y against A, B, …) as well as measuring association while controlling for covariates (Y against X controlling for C). An R package named matie ("Measuring Association and Testing Independence Efficiently") is available (http://cran.r-project.org/web/packages/matie/).
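The linear special case mentioned in the abstract, where R² equals ρ², can be checked numerically. The following is a minimal sketch using only the Python standard library. It is not the matie estimator and does not reproduce any part of the paper's method. The synthetic data, the noise level, and the helper names (`pearson_r`, `least_squares_line`, `r_squared`) are assumptions made for illustration.

```python
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(ys, fitted):
    """Proportion of the variance in ys explained by the fitted values."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: a noisy linear relationship (all values are assumptions).
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 3.0) for x in xs]

slope, intercept = least_squares_line(xs, ys)
fitted = [slope * x + intercept for x in xs]

r2 = r_squared(ys, fitted)
rho = pearson_r(xs, ys)
print(f"R^2 = {r2:.6f}, rho^2 = {rho ** 2:.6f}")
```

The two printed values agree up to floating-point rounding because the fit is an ordinary least-squares line with an intercept. For a non-linear relationship the identity no longer holds, which is the situation the generalised R² in the abstract is meant to address.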
Similar resources
Discovering Concept-Level Event Associations from a Text Stream
We study an open text mining problem – discovering concept-level event associations from a text stream. We investigate the importance and challenge of this task and propose a novel solution by using event sequential patterns. The proposed approach can discover important event associations implicitly expressed. The discovered event associations are general and useful as knowledge for application...
A General Survey on Multidimensional and Quantitative Association Rule Mining Algorithms
Data mining is one of the significant topics of research in recent years. Association rule is a method for discovering interesting relations between variables in large databases. Support and Confidence are the two basic parameters used to study the threshold values for each database. In this paper, an overall survey of the algorithms implementing the multidimensional and quantitative data is pr...
An extended multidimensional Hardy-Hilbert-type inequality with a general homogeneous kernel
In this paper, by the use of the weight coefficients, the transfer formula and the technique of real analysis, an extended multidimensional Hardy-Hilbert-type inequality with a general homogeneous kernel and a best possible constant factor is given. Moreover, the equivalent forms, the operator expressions and a few examples are considered.
Efficient Parallel Algorithms for Mining Associations ? Parallel Algorithms for Discovering Associations
The problem of mining hidden associations present in the large amounts of data has seen widespread applications in many practical domains such as customer-oriented planning and marketing, telecommunication network monitoring, and analyzing data from scientific experiments. The combinatorial complexity of the problem has fascinated many researchers. Many elegant techniques, such as Apriori, have...
A review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
Journal title:
Volume 11, Issue
Pages -
Publication date 2016